Proactive Learning for Building Machine Translation Systems for Minority Languages
Abstract
Building machine translation (MT) systems for the world's many minority languages is a serious challenge. For many minority languages there is little machine-readable text, few knowledgeable linguists, and little money available for MT development. It is therefore very important for an MT system to make the best use of its resources, both labeled and unlabeled, in building a quality system. In this paper we argue that the traditional active learning setup may not be the right fit for seeking the annotations required to build a syntax-based MT system for minority languages. We posit that a relatively new variant of active learning, proactive learning, is more suitable for this task.
Similar resources
Linguistic Structure and Bilingual Informants Help Induce Machine Translation of Lesser-Resourced Languages
Producing machine translation (MT) for the many minority languages in the world is a serious challenge. Minority languages typically have few resources for building MT systems. For many minor languages there is little machine readable text, few knowledgeable linguists, and little money available for MT development. For these reasons, our research programs on minority language MT have focused on...
The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...
Building Language Resources and Translation Models for Machine Translation Focused on South Slavic and Balkan Languages
The aim of this short-term project was to investigate the feasibility of machine translation (MT) research and development for several South Slavic and Balkan languages, more precisely Romanian, Bulgarian, Slovene, Greek and Serbian. For these languages, MT systems are scarce and for some of them even non-existent. We provide a brief description of the project’s major research tasks: Compilatio...
EuskoParl: a speech and text Spanish-Basque parallel corpus
The advances in corpus-based approaches and machine learning techniques have promoted the development of minority languages. The contribution of this work is to acquire a parallel corpus in Spanish and Basque with both text and speech data. In order to be able to compare the systems with those developed for other languages, Europarl corpus was taken as a reference in both domain and size. The a...
Rule-based Breton to French machine translation
This paper describes a rule-based machine translation system from Breton to French intended for producing gisting translations. The paper presents a summary of the ongoing development of the system, along with an evaluation of two versions, and some reflection on the use of MT systems for lesser-resourced or minority languages.
Publication date: 2009